Twitter Text Analytics¶


Muralidhar Reddy Reddem U64546777¶

API intro¶

ISM6564 Fall 2023

© 2023 Murali


Use screen scraping to collect the speeches of all US Presidents, available from the Miller Center's website (https://millercenter.org/the-presidency/presidential-speeches). Add to the corpus the year of each speech, each president’s party affiliation (Democrat, Republican, or Other), and the start and end dates of the president's term (find a website with this information, and use screen scraping to extract the relevant information). Place all fields (including the speech content, party affiliation, and start and end dates) in a CSV file.

Analyze the content of your csv file to answer the following questions.

  1. Which president has the most vocabulary, as evident from their inaugural speeches, and which president has the least vocabulary? On average, do Democratic, Republican, or Other presidents have a higher vocabulary? (2 points)

  2. Create a barplot of presidential vocabulary from the earliest president (Washington) to the latest (Biden) in chronological order. Color code this barplot as blue for Democrat, red for Republican, and gray for Others. (1 point)

  3. What are the five most frequently used words (exclusive of stop words) for each president? What are the five words most frequently used collectively by all Democratic presidents versus Republican presidents? (2 points)

  4. What are the key themes (e.g., freedom, liberty, country, etc.) used by each president in their inaugural speech? (3 points)

  5. Compute a sentiment (positive/negative) for each presidential speech, and draw a barplot of the sentiment of all presidential speeches in chronological order. Again, color code the speeches as blue for Democrat, red for Republican, and gray for Other. Which of these groups have higher mean sentiment score? Who are the top three presidents with the highest positive sentiment in each group? (2 points)

NOTE1: To receive any marks, you must submit the working code that scrapes and assembles the csv data and analyzes the data to answer the questions above, and the resulting csv file. You must also submit an html export of your notebook that shows that you have successfully run this notebook.

NOTE2: Points will be deducted for copy-and-paste code from the class examples without thinking about their appropriateness for the assignment. Your code must be compact, free of errors, without unnecessary details not asked in the question, using functions and loops as appropriate, and using some comment statements. You will lose points if you fail to adhere to these common coding expectations.

import required packages¶

In [2]:
import pandas as pd
import time

from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.common.by import By

from webdriver_manager.firefox import GeckoDriverManager 

from bs4 import BeautifulSoup as bs

import re
import dateparser




import numpy as np
from matplotlib import pyplot as plt
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from nltk.util import ngrams

from wordcloud import WordCloud
from PIL import Image # used for opening image for masking wordcloud # you need to install Pillow package

nltk.download('punkt') # sentence tokenizer
nltk.download('stopwords')
nltk.download('wordnet') # WordNet is a lexical database for the English language - used to find the lemma of a word

nltk.download('vader_lexicon') # Valence Aware Dictionary and sEntiment Reasoner
from nltk.sentiment.vader import SentimentIntensityAnalyzer

from collections import Counter
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\rmura\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\rmura\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\rmura\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\rmura\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!

Open the URL in a browser, scroll until the whole page has loaded, then close the browser¶

In [2]:
# Start a driver session....


# if you have selenium 4 installed, use one of these:
driver = webdriver.Firefox(service=Service(GeckoDriverManager().install())) # this will work on Windows and Mac, and should work on Linux when run the first time
#driver = webdriver.Firefox() # use if geckodriver is in your PATH environment variable (which includes the same folder as your notebook)

# load page with Selenium
driver.get("https://millercenter.org/the-presidency/presidential-speeches")
driver.implicitly_wait(10) # implicitly_wait method sets a sticky timeout to implicitly wait for an element to be found, or a command to complete. This method only 
# needs to be called one time per session. 

pause_scroll = 3 # we need to pause after each time we scroll down
previous_page_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(pause_scroll)
    new_page_height = driver.execute_script("return document.body.scrollHeight")
    if new_page_height == previous_page_height:
        break
    previous_page_height = new_page_height
    

page_source = driver.page_source
driver.close()
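
The scroll loop above stops only when the page height stops changing, so it could keep running if a page never settles. A minimal sketch of the same idea with an upper bound on the number of scrolls is shown here. The names `scroll_to_bottom` and `max_scrolls` are hypothetical, and the function assumes only that the driver exposes `execute_script`, as the Selenium driver used above does.

```python
import time


def scroll_to_bottom(driver, pause=3.0, max_scrolls=200):
    """Scroll until the page height stops growing or max_scrolls is reached.

    Returns the number of scrolls performed.
    """
    previous_height = driver.execute_script("return document.body.scrollHeight")
    for scroll_count in range(1, max_scrolls + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == previous_height:
            return scroll_count  # nothing new was loaded
        previous_height = new_height
    return max_scrolls  # gave up: the page was still growing
```

With a live session this would be called as `scroll_to_bottom(driver)` before reading `driver.page_source`.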
In [3]:
! pip install lxml
Requirement already satisfied: lxml in c:\users\rmura\anaconda3\envs\text_analytics\lib\site-packages (4.9.2)
In [4]:
#retrieve urls to all speeches
bsobject_linkpage = bs(page_source,'lxml')
bs_links = bsobject_linkpage.find_all("a", href = re.compile('presidential-speeches/'))
bs_links[-1] # display the last link (the earliest speech)
Out[4]:
<a href="https://millercenter.org/the-presidency/presidential-speeches/april-30-1789-first-inaugural-address" target="_blank">April 30, 1789: First Inaugural Address</a>
In [5]:
speech_link_list = []
for link in bs_links:   
    speech_link_list.append(link['href'])

speech_link_list[:5] # display first 5
Out[5]:
['https://millercenter.org/the-presidency/presidential-speeches/february-21-2023-remarks-one-year-anniversary-ukraine-war',
 'https://millercenter.org/the-presidency/presidential-speeches/february-7-2023-state-union-address',
 'https://millercenter.org/the-presidency/presidential-speeches/september-21-2022-speech-77th-session-united-nations-general',
 'https://millercenter.org/the-presidency/presidential-speeches/september-1-2022-remarks-continued-battle-soul-nation',
 'https://millercenter.org/the-presidency/presidential-speeches/may-24-2022-remarks-school-shooting-uvalde-texas']
In [7]:
%%time
# looking at html content...
# there are classes called president-name, episode-date, speed-loc, about-sidebar--intro,
# presidential-speeches--title, view-transcript
# 
# view-transcript content may have multiple "Transcript" Headers (header 3)
# it will also include a title ending in a colon
# 
#scrape the speech#

driver = webdriver.Firefox(service=Service(GeckoDriverManager().install())) # start a new session

pause_between_pages = 2

# create empty lists to store data from each page
title, speech, name, date, about = ([] for i in range(5))

for link in speech_link_list:
    #access speech page with Selenium and find div class "transcript-inner"
    driver.get(link)

    # use beautiful soup to parse the html
    bsobject_speechpage = bs(driver.page_source, 'lxml')

    #scrape speech text, title, president's name, date of speech and text about the speech.
    try:
        title.append(bsobject_speechpage.find('h2', class_="presidential-speeches--title").text.strip())
    except:
        title.append("No title available")
        
    try:
        speech_raw = bsobject_speechpage.find('div', class_="transcript-inner").text.strip().replace('\xa0', '')
        speech.append(re.sub(r"Transcript|\n","",speech_raw)) 
    except:
        try: # older links use the class view-transcript instead of transcript-inner; if transcript-inner doesn't work, try view-transcript
            speech_raw = bsobject_speechpage.find('div', class_="view-transcript").text.strip().replace('\xa0', '')
            speech.append(re.sub(r"Transcript|\n"," ",speech_raw)) 
        except:
            speech.append("No speech available")
    
    try:
        name.append(bsobject_speechpage.find('p', class_="president-name").text.strip())
    except:
        name.append("No name available")
    
    try:
        date.append(dateparser.parse(bsobject_speechpage.find('p', class_="episode-date").text.strip()))
    except:
        date.append("No date available")
        
    try:
        about.append(bsobject_speechpage.find('div', class_="about-sidebar--intro").text.strip())
    except:
        about.append("No info available")
    
    # pause before getting next page
    time.sleep(pause_between_pages)

driver.close()
CPU times: total: 19.8 s
Wall time: 59min 9s
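
The five `try`/`except` blocks in the scraping loop follow one pattern: look up an element and fall back to a default string when it is missing. A minimal sketch of a helper that captures the pattern is shown here. The name `text_or_default` is hypothetical. It relies on BeautifulSoup's `find` returning `None` when nothing matches and on matched elements having a `.text` attribute.

```python
def text_or_default(element, default):
    """Return the stripped text of a parsed element, or default if it is None."""
    if element is None:
        return default
    return element.text.strip()


# Intended use inside the loop (bsobject_speechpage is the parsed page):
# title.append(text_or_default(
#     bsobject_speechpage.find('h2', class_="presidential-speeches--title"),
#     "No title available"))
```

Checking for `None` explicitly avoids a bare `except`, which would also hide unrelated errors such as a dropped browser session.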
In [8]:
#save this to a dataframe and save to a csv file#
if len(title) == len(speech) == len(name) == len(date) == len(about):
    speeches_presidents = pd.DataFrame({'name':name,'title':title,'date':date,'about':about,'speech':speech})
    speeches_presidents['speech'] = speeches_presidents['speech'].apply(lambda x: x.replace(".",". "))
    speeches_presidents.to_csv("./data/presidential_speeches.csv",index=False)
else:
    print("Something went wrong with scraping the speeches. Please check the code.")
    
    # dump the data to csv files for debugging
    df_names = pd.DataFrame({'name':name}) 
    df_names.to_csv("./data/names.csv",index=False)
    
    df_titles = pd.DataFrame({'title':title})
    df_titles.to_csv("./data/titles.csv",index=False)
    
    df_dates = pd.DataFrame({'date':date})
    df_dates.to_csv("./data/dates.csv",index=False)
    
    df_infos = pd.DataFrame({'about':about})
    df_infos.to_csv("./data/about.csv",index=False)
    
    df_speeches = pd.DataFrame({'speech':speech})
    df_speeches.to_csv("./data/speeches.csv",index=False)
In [10]:
speeches_presidents.head()
Out[10]:
name title date about speech
0 Joe Biden February 21, 2023: Remarks on the One-Year Ann... 2023-02-21 Speaking at the Royal Castle in Warsaw, Poland... THE PRESIDENT: Hello, Poland! One of our great...
1 Joe Biden February 7, 2023: State of the Union Address 2023-02-07 In his State of the Union Address, President J... Mr. Speaker. Madam Vice President. Our Firs...
2 Joe Biden September 21, 2022: Speech before the 77th Ses... 2022-09-21 President Joe Biden addresses the 77th session... Thank you. Mr. President, Mr. Secretary-Gene...
3 Joe Biden September 1, 2022: Remarks on the Continued Ba... 2022-09-01 President Joe Biden speaks in Philadelphia, Pe... THE PRESIDENT: My fellow Americans, please, if...
4 Joe Biden May 24, 2022: Remarks on School Shooting in Uv... 2022-05-24 President Biden makes an impassioned plea to s... Good evening, fellow Americans. I had hoped, w...
In [11]:
# here we scrape information on president's term and party
# 
# NOTE: Britannica seems to be attempting to block web scrapers. When this happens, a regular requests
# approach will fail. To bypass this, you will need to use selenium.
# The following code should work if the site is blocking scrapers:

# Start a driver session....
# if you have selenium 4 installed, use one of these:
driver = webdriver.Firefox(service=Service(GeckoDriverManager().install())) # this will work on Windows and Mac, and should work on Linux when run the first time
#driver = webdriver.Firefox() # use if geckodriver is in your PATH environment variable (which includes the same folder as your notebook)

driver.get("https://www.britannica.com/topic/Presidents-of-the-United-States-1846696")
driver.implicitly_wait(10)
page_source = driver.page_source
driver.close() 
In [12]:
# pandas read html will parse the contents of the table in the downloaded webpage
presidents = pd.read_html(page_source)[0]
presidents
Out[12]:
Unnamed: 0 no. president birthplace political party term
0 NaN 1 George Washington Va. Federalist 1789–97
1 NaN 2 John Adams Mass. Federalist 1797–1801
2 NaN 3 Thomas Jefferson Va. Democratic-Republican 1801–09
3 NaN 4 James Madison Va. Democratic-Republican 1809–17
4 NaN 5 James Monroe Va. Democratic-Republican 1817–25
5 NaN 6 John Quincy Adams Mass. National Republican 1825–29
6 NaN 7 Andrew Jackson S.C. Democratic 1829–37
7 NaN 8 Martin Van Buren N.Y. Democratic 1837–41
8 NaN 9 William Henry Harrison Va. Whig 1841*
9 NaN 10 John Tyler Va. Whig 1841–45
10 NaN 11 James K. Polk N.C. Democratic 1845–49
11 NaN 12 Zachary Taylor Va. Whig 1849–50*
12 NaN 13 Millard Fillmore N.Y. Whig 1850–53
13 NaN 14 Franklin Pierce N.H. Democratic 1853–57
14 NaN 15 James Buchanan Pa. Democratic 1857–61
15 NaN 16 Abraham Lincoln Ky. Republican 1861–65*
16 NaN 17 Andrew Johnson N.C. Democratic (Union) 1865–69
17 NaN 18 Ulysses S. Grant Ohio Republican 1869–77
18 NaN 19 Rutherford B. Hayes Ohio Republican 1877–81
19 NaN 20 James A. Garfield Ohio Republican 1881*
20 NaN 21 Chester A. Arthur Vt. Republican 1881–85
21 NaN 22 Grover Cleveland N.J. Democratic 1885–89
22 NaN 23 Benjamin Harrison Ohio Republican 1889–93
23 NaN 24 Grover Cleveland N.J. Democratic 1893–97
24 NaN 25 William McKinley Ohio Republican 1897–1901*
25 NaN 26 Theodore Roosevelt N.Y. Republican 1901–09
26 NaN 27 William Howard Taft Ohio Republican 1909–13
27 NaN 28 Woodrow Wilson Va. Democratic 1913–21
28 NaN 29 Warren G. Harding Ohio Republican 1921–23*
29 NaN 30 Calvin Coolidge Vt. Republican 1923–29
30 NaN 31 Herbert Hoover Iowa Republican 1929–33
31 NaN 32 Franklin D. Roosevelt N.Y. Democratic 1933–45*
32 NaN 33 Harry S. Truman Mo. Democratic 1945–53
33 NaN 34 Dwight D. Eisenhower Texas Republican 1953–61
34 NaN 35 John F. Kennedy Mass. Democratic 1961–63*
35 NaN 36 Lyndon B. Johnson Texas Democratic 1963–69
36 NaN 37 Richard M. Nixon Calif. Republican 1969–74**
37 NaN 38 Gerald R. Ford Neb. Republican 1974–77
38 NaN 39 Jimmy Carter Ga. Democratic 1977–81
39 NaN 40 Ronald Reagan Ill. Republican 1981–89
40 NaN 41 George Bush Mass. Republican 1989–93
41 NaN 42 Bill Clinton Ark. Democratic 1993–2001
42 NaN 43 George W. Bush Conn. Republican 2001–09
43 NaN 44 Barack Obama Hawaii Democratic 2009–17
44 NaN 45 Donald Trump N.Y. Republican 2017–21
45 NaN 46 Joe Biden Pa. Democratic 2021–
46 *Died in office. *Died in office. *Died in office. *Died in office. *Died in office. *Died in office.
47 **Resigned from office. **Resigned from office. **Resigned from office. **Resigned from office. **Resigned from office. **Resigned from office.
In [13]:
# note that the last two rows contain footnotes, not presidential information
# let's remove these last two rows...
presidents = presidents.drop(presidents.index[-2:])
presidents
Out[13]:
Unnamed: 0 no. president birthplace political party term
0 NaN 1 George Washington Va. Federalist 1789–97
1 NaN 2 John Adams Mass. Federalist 1797–1801
2 NaN 3 Thomas Jefferson Va. Democratic-Republican 1801–09
3 NaN 4 James Madison Va. Democratic-Republican 1809–17
4 NaN 5 James Monroe Va. Democratic-Republican 1817–25
5 NaN 6 John Quincy Adams Mass. National Republican 1825–29
6 NaN 7 Andrew Jackson S.C. Democratic 1829–37
7 NaN 8 Martin Van Buren N.Y. Democratic 1837–41
8 NaN 9 William Henry Harrison Va. Whig 1841*
9 NaN 10 John Tyler Va. Whig 1841–45
10 NaN 11 James K. Polk N.C. Democratic 1845–49
11 NaN 12 Zachary Taylor Va. Whig 1849–50*
12 NaN 13 Millard Fillmore N.Y. Whig 1850–53
13 NaN 14 Franklin Pierce N.H. Democratic 1853–57
14 NaN 15 James Buchanan Pa. Democratic 1857–61
15 NaN 16 Abraham Lincoln Ky. Republican 1861–65*
16 NaN 17 Andrew Johnson N.C. Democratic (Union) 1865–69
17 NaN 18 Ulysses S. Grant Ohio Republican 1869–77
18 NaN 19 Rutherford B. Hayes Ohio Republican 1877–81
19 NaN 20 James A. Garfield Ohio Republican 1881*
20 NaN 21 Chester A. Arthur Vt. Republican 1881–85
21 NaN 22 Grover Cleveland N.J. Democratic 1885–89
22 NaN 23 Benjamin Harrison Ohio Republican 1889–93
23 NaN 24 Grover Cleveland N.J. Democratic 1893–97
24 NaN 25 William McKinley Ohio Republican 1897–1901*
25 NaN 26 Theodore Roosevelt N.Y. Republican 1901–09
26 NaN 27 William Howard Taft Ohio Republican 1909–13
27 NaN 28 Woodrow Wilson Va. Democratic 1913–21
28 NaN 29 Warren G. Harding Ohio Republican 1921–23*
29 NaN 30 Calvin Coolidge Vt. Republican 1923–29
30 NaN 31 Herbert Hoover Iowa Republican 1929–33
31 NaN 32 Franklin D. Roosevelt N.Y. Democratic 1933–45*
32 NaN 33 Harry S. Truman Mo. Democratic 1945–53
33 NaN 34 Dwight D. Eisenhower Texas Republican 1953–61
34 NaN 35 John F. Kennedy Mass. Democratic 1961–63*
35 NaN 36 Lyndon B. Johnson Texas Democratic 1963–69
36 NaN 37 Richard M. Nixon Calif. Republican 1969–74**
37 NaN 38 Gerald R. Ford Neb. Republican 1974–77
38 NaN 39 Jimmy Carter Ga. Democratic 1977–81
39 NaN 40 Ronald Reagan Ill. Republican 1981–89
40 NaN 41 George Bush Mass. Republican 1989–93
41 NaN 42 Bill Clinton Ark. Democratic 1993–2001
42 NaN 43 George W. Bush Conn. Republican 2001–09
43 NaN 44 Barack Obama Hawaii Democratic 2009–17
44 NaN 45 Donald Trump N.Y. Republican 2017–21
45 NaN 46 Joe Biden Pa. Democratic 2021–
In [14]:
# first, split the string in the term column on the en dash and parse the first part as a year - store this in a new column called 'start_date'
presidents['start_date'] = presidents['term'].apply(lambda x: dateparser.parse(x.split("–")[0]).year)

# calculate 'end_date' based on the content of the term string
def to_year(row):    
    row['term'] = re.sub(r"[^\d-]", "", row['term']) # replace any non-digit before dash with blank
    term_list = row['term'].split("–") # split on dash (to get start and end year)
    if  len(term_list)== 1: # if we only have one date, then this is both from and to
        return row['start_date']
    elif len(term_list) == 2:
        return row['start_date'][:2] + term_list[1] # return first two digits of from with string in to field
    else:
        return "bad data"
    return row
    
presidents['end_date'] = presidents.apply(lambda row: to_year(row), axis=1)

presidents
Out[14]:
Unnamed: 0 no. president birthplace political party term start_date end_date
0 NaN 1 George Washington Va. Federalist 1789–97 1789 1789
1 NaN 2 John Adams Mass. Federalist 1797–1801 1797 1797
2 NaN 3 Thomas Jefferson Va. Democratic-Republican 1801–09 1801 1801
3 NaN 4 James Madison Va. Democratic-Republican 1809–17 1809 1809
4 NaN 5 James Monroe Va. Democratic-Republican 1817–25 1817 1817
5 NaN 6 John Quincy Adams Mass. National Republican 1825–29 1825 1825
6 NaN 7 Andrew Jackson S.C. Democratic 1829–37 1829 1829
7 NaN 8 Martin Van Buren N.Y. Democratic 1837–41 1837 1837
8 NaN 9 William Henry Harrison Va. Whig 1841* 1841 1841
9 NaN 10 John Tyler Va. Whig 1841–45 1841 1841
10 NaN 11 James K. Polk N.C. Democratic 1845–49 1845 1845
11 NaN 12 Zachary Taylor Va. Whig 1849–50* 1849 1849
12 NaN 13 Millard Fillmore N.Y. Whig 1850–53 1850 1850
13 NaN 14 Franklin Pierce N.H. Democratic 1853–57 1853 1853
14 NaN 15 James Buchanan Pa. Democratic 1857–61 1857 1857
15 NaN 16 Abraham Lincoln Ky. Republican 1861–65* 1861 1861
16 NaN 17 Andrew Johnson N.C. Democratic (Union) 1865–69 1865 1865
17 NaN 18 Ulysses S. Grant Ohio Republican 1869–77 1869 1869
18 NaN 19 Rutherford B. Hayes Ohio Republican 1877–81 1877 1877
19 NaN 20 James A. Garfield Ohio Republican 1881* 1881 1881
20 NaN 21 Chester A. Arthur Vt. Republican 1881–85 1881 1881
21 NaN 22 Grover Cleveland N.J. Democratic 1885–89 1885 1885
22 NaN 23 Benjamin Harrison Ohio Republican 1889–93 1889 1889
23 NaN 24 Grover Cleveland N.J. Democratic 1893–97 1893 1893
24 NaN 25 William McKinley Ohio Republican 1897–1901* 1897 1897
25 NaN 26 Theodore Roosevelt N.Y. Republican 1901–09 1901 1901
26 NaN 27 William Howard Taft Ohio Republican 1909–13 1909 1909
27 NaN 28 Woodrow Wilson Va. Democratic 1913–21 1913 1913
28 NaN 29 Warren G. Harding Ohio Republican 1921–23* 1921 1921
29 NaN 30 Calvin Coolidge Vt. Republican 1923–29 1923 1923
30 NaN 31 Herbert Hoover Iowa Republican 1929–33 1929 1929
31 NaN 32 Franklin D. Roosevelt N.Y. Democratic 1933–45* 1933 1933
32 NaN 33 Harry S. Truman Mo. Democratic 1945–53 1945 1945
33 NaN 34 Dwight D. Eisenhower Texas Republican 1953–61 1953 1953
34 NaN 35 John F. Kennedy Mass. Democratic 1961–63* 1961 1961
35 NaN 36 Lyndon B. Johnson Texas Democratic 1963–69 1963 1963
36 NaN 37 Richard M. Nixon Calif. Republican 1969–74** 1969 1969
37 NaN 38 Gerald R. Ford Neb. Republican 1974–77 1974 1974
38 NaN 39 Jimmy Carter Ga. Democratic 1977–81 1977 1977
39 NaN 40 Ronald Reagan Ill. Republican 1981–89 1981 1981
40 NaN 41 George Bush Mass. Republican 1989–93 1989 1989
41 NaN 42 Bill Clinton Ark. Democratic 1993–2001 1993 1993
42 NaN 43 George W. Bush Conn. Republican 2001–09 2001 2001
43 NaN 44 Barack Obama Hawaii Democratic 2009–17 2009 2009
44 NaN 45 Donald Trump N.Y. Republican 2017–21 2017 2017
45 NaN 46 Joe Biden Pa. Democratic 2021– 2021 2021
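
In the output above, `end_date` equals `start_date` in every row. A likely cause is that the pattern `[^\d-]` keeps only digits and the ASCII hyphen, so it also strips the en dash (–) used in the `term` strings, and the later split on the en dash always yields a single piece. A minimal sketch of a parser that keeps the en dash and expands two-digit end years is shown here. The name `parse_term` is hypothetical, and returning `None` for an open-ended term such as `2021–` is an assumption.

```python
import re


def parse_term(term):
    """Split a term string such as '1797–1801' or '1849–50*' into (start, end) years.

    A single year ('1841*') is used for both start and end.
    An open-ended term ('2021–') returns None as the end year.
    """
    cleaned = re.sub(r"[^\d–]", "", term)  # keep digits and the en dash only
    start_text, _, end_text = cleaned.partition("–")
    start = int(start_text)
    if "–" not in cleaned:
        return start, start
    if not end_text:
        return start, None
    if len(end_text) < 4:
        # borrow the leading digits from the start year: '1789' + '97' -> '1797'
        end_text = start_text[: 4 - len(end_text)] + end_text
    return start, int(end_text)
```

Applied to the `term` column, this would give an end year of 1797 for `1789–97` and 1901 for `1897–1901*`.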
In [15]:
#C:\Users\rmura\Murali\Text Analytics\week3\data
presidents.to_csv("./data/presidential_party_and_term.csv", index=False)
In [16]:
speeches = pd.read_csv("./data/presidential_speeches.csv")
parties = pd.read_csv("./data/presidential_party_and_term.csv")

# change the column name of the presidents dataframe to match the speeches dataframe
# In the presidents dataframe, the column name is 'president'. In the speeches dataframe, the column name is 'name'
parties = parties.rename(columns={'president':'name'})
In [17]:
speeches.head()
Out[17]:
name title date about speech
0 Joe Biden February 21, 2023: Remarks on the One-Year Ann... 2023-02-21 Speaking at the Royal Castle in Warsaw, Poland... THE PRESIDENT: Hello, Poland! One of our great...
1 Joe Biden February 7, 2023: State of the Union Address 2023-02-07 In his State of the Union Address, President J... Mr. Speaker. Madam Vice President. Our Firs...
2 Joe Biden September 21, 2022: Speech before the 77th Ses... 2022-09-21 President Joe Biden addresses the 77th session... Thank you. Mr. President, Mr. Secretary-Gene...
3 Joe Biden September 1, 2022: Remarks on the Continued Ba... 2022-09-01 President Joe Biden speaks in Philadelphia, Pe... THE PRESIDENT: My fellow Americans, please, if...
4 Joe Biden May 24, 2022: Remarks on School Shooting in Uv... 2022-05-24 President Biden makes an impassioned plea to s... Good evening, fellow Americans. I had hoped, w...
In [18]:
parties.head()
Out[18]:
Unnamed: 0 no. name birthplace political party term start_date end_date
0 NaN 1 George Washington Va. Federalist 1789–97 1789 1789
1 NaN 2 John Adams Mass. Federalist 1797–1801 1797 1797
2 NaN 3 Thomas Jefferson Va. Democratic-Republican 1801–09 1801 1801
3 NaN 4 James Madison Va. Democratic-Republican 1809–17 1809 1809
4 NaN 5 James Monroe Va. Democratic-Republican 1817–25 1817 1817
In [19]:
import difflib # this will provide us with a 'fuzzy' match between the presidential names found in each table

#convert name in party to name it most closely matches in speeches
parties['name'] = parties['name'].apply(lambda x: difflib.get_close_matches(x, speeches['name'])[0])
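
`difflib.get_close_matches` returns an empty list when no candidate clears its similarity cutoff (0.6 by default), so indexing the result with `[0]` can raise an `IndexError`. A minimal sketch of a guarded version is shown here. The name `closest_name` is hypothetical, and falling back to the original name is an assumption about the desired behavior.

```python
import difflib


def closest_name(name, candidates, cutoff=0.6):
    """Return the candidate most similar to name, or name itself if none is close enough."""
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else name
```

It could be applied as `parties['name'].apply(lambda x: closest_name(x, speeches['name'].unique().tolist()))`. Fuzzy matching can also pick an unintended neighbor when two presidents share most of a name, so it is worth printing the resulting mapping and checking it by eye before merging.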
In [33]:
# merge the DataFrames into one
merged = speeches.merge(parties)

# view final DataFrame
merged
Out[33]:
name title date about speech Unnamed: 0 no. birthplace political party term start_date end_date
0 Joe Biden February 21, 2023: Remarks on the One-Year Ann... 2023-02-21 Speaking at the Royal Castle in Warsaw, Poland... THE PRESIDENT: Hello, Poland! One of our great... NaN 46 Pa. Democratic 2021– 2021 2021
1 Joe Biden February 7, 2023: State of the Union Address 2023-02-07 In his State of the Union Address, President J... Mr. Speaker. Madam Vice President. Our Firs... NaN 46 Pa. Democratic 2021– 2021 2021
2 Joe Biden September 21, 2022: Speech before the 77th Ses... 2022-09-21 President Joe Biden addresses the 77th session... Thank you. Mr. President, Mr. Secretary-Gene... NaN 46 Pa. Democratic 2021– 2021 2021
3 Joe Biden September 1, 2022: Remarks on the Continued Ba... 2022-09-01 President Joe Biden speaks in Philadelphia, Pe... THE PRESIDENT: My fellow Americans, please, if... NaN 46 Pa. Democratic 2021– 2021 2021
4 Joe Biden May 24, 2022: Remarks on School Shooting in Uv... 2022-05-24 President Biden makes an impassioned plea to s... Good evening, fellow Americans. I had hoped, w... NaN 46 Pa. Democratic 2021– 2021 2021
... ... ... ... ... ... ... ... ... ... ... ... ...
1086 George Washington December 29, 1790: Talk to the Chiefs and Coun... 1790-12-29 The President reassures the Seneca Nation that... I the President of the United States, by my... NaN 1 Va. Federalist 1789–97 1789 1789
1087 George Washington December 8, 1790: Second Annual Message to Con... 1790-12-08 Washington focuses on commerce in his second a... Fellow citizens of the Senate and House of ... NaN 1 Va. Federalist 1789–97 1789 1789
1088 George Washington January 8, 1790: First Annual Message to Congress 1790-01-08 In a wide-ranging speech, President Washington... Fellow Citizens of the Senate and House of R... NaN 1 Va. Federalist 1789–97 1789 1789
1089 George Washington October 3, 1789: Thanksgiving Proclamation 1789-10-03 At the request of Congress, Washington establi... Whereas it is the duty of all Nations to ack... NaN 1 Va. Federalist 1789–97 1789 1789
1090 George Washington April 30, 1789: First Inaugural Address 1789-04-30 President George Washington calls on Congress ... Fellow Citizens of the Senate and the House ... NaN 1 Va. Federalist 1789–97 1789 1789

1091 rows × 12 columns

In [23]:
merged.isna().sum()
Out[23]:
name                  0
title                 0
date                  0
about                 1
speech                0
Unnamed: 0         1091
no.                   0
birthplace            0
political party       0
term                  0
start_date            0
end_date              0
dtype: int64

Removing unwanted columns such as the all-NaN 'Unnamed: 0'¶

In [34]:
merged.columns
Out[34]:
Index(['name', 'title', 'date', 'about', 'speech', 'Unnamed: 0', 'no.',
       'birthplace', 'political party', 'term', 'start_date', 'end_date'],
      dtype='object')
In [36]:
merged.drop('Unnamed: 0',axis=1,inplace=True)
In [37]:
merged.to_csv("./data/presidential_speeches_merged.csv",index=False)
In [38]:
merged['political party'].value_counts()
Out[38]:
political party
Democratic               497
Republican               439
Democratic-Republican     56
Democratic (Union)        31
Whig                      30
Federalist                30
National Republican        8
Name: count, dtype: int64
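
The assignment asks for three groups (Democrat, Republican, Other), while the counts above show seven party labels. A minimal sketch of one way to collapse them is shown here. The names `party_group` and `GROUP_COLORS` are hypothetical, and treating "Democratic (Union)" and "Democratic-Republican" as Other is a judgment call that can be changed.

```python
def party_group(party):
    """Collapse a party label into 'Democrat', 'Republican', or 'Other'."""
    # Assumption: only the exact labels count, so 'Democratic (Union)' and
    # 'Democratic-Republican' fall into 'Other'.
    if party == "Democratic":
        return "Democrat"
    if party == "Republican":
        return "Republican"
    return "Other"


# colors requested by the assignment for the three groups
GROUP_COLORS = {"Democrat": "blue", "Republican": "red", "Other": "gray"}
```

With the merged frame this could be used as `merged['political party'].map(party_group)`.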
  1. Which president has the most vocabulary, as evident from their inaugural speeches, and which president has the least vocabulary? On average, do Democratic, Republican, or Other presidents have a higher vocabulary? (2 points)
In [3]:
speech_data_pre=pd.read_csv("./data/presidential_speeches_merged.csv")
speech_data_pre.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1091 entries, 0 to 1090
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   name             1091 non-null   object
 1   title            1091 non-null   object
 2   date             1091 non-null   object
 3   about            1090 non-null   object
 4   speech           1091 non-null   object
 5   no.              1091 non-null   int64 
 6   birthplace       1091 non-null   object
 7   political party  1091 non-null   object
 8   term             1091 non-null   object
 9   start_date       1091 non-null   int64 
 10  end_date         1091 non-null   int64 
dtypes: int64(3), object(8)
memory usage: 93.9+ KB
In [3]:
# convert the raw text of every speech into a list of cleaned tokens
stop_words = set(stopwords.words('english')) # build once; set lookups are fast
# nltk.download('wordnet') # uncomment if you need to download the wordnet package
lemmatizer = WordNetLemmatizer()

toke=[]
for x in speech_data_pre.speech:
    
    # use nltk (Natural Language Toolkit) to split the text into tokens
    tokens=nltk.word_tokenize(x)

    # remove all tokens that are not alphabetic
    tokens = [word for word in tokens if word.isalpha()]

    # make lowercase
    tokens = [word.lower() for word in tokens]

    # remove all tokens that are only one character
    tokens = [word for word in tokens if len(word) > 1]

    # remove stopwords
    tokens = [word for word in tokens if word not in stop_words]

    # lemmatize words (lemmatization is a text normalization technique that converts words to their base forms)
    tokens = [lemmatizer.lemmatize(word) for word in tokens]

    toke.append(tokens)

speech_data_pre['tokens']=toke
In [39]:
speech_data_pre.to_csv("./data/Final_presidential_speeches.csv")
speech_data_pre.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1091 entries, 0 to 1090
Data columns (total 13 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   name             1091 non-null   object
 1   title            1091 non-null   object
 2   date             1091 non-null   object
 3   about            1090 non-null   object
 4   speech           1091 non-null   object
 5   no.              1091 non-null   int64 
 6   birthplace       1091 non-null   object
 7   political party  1091 non-null   object
 8   term             1091 non-null   object
 9   start_date       1091 non-null   int64 
 10  end_date         1091 non-null   int64 
 11  tokens           1091 non-null   object
 12  wordCount        1091 non-null   int64 
dtypes: int64(4), object(9)
memory usage: 110.9+ KB
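
Saving the `tokens` column to CSV stores each list as its text representation, so reading the file back yields strings rather than lists. A minimal sketch of the round trip using only the standard library is shown here. The in-memory buffer stands in for the file, and the sample row is illustrative.

```python
import ast
import csv
import io

# write one row whose 'tokens' field is a list rendered as text
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "tokens"])
writer.writerow(["Joe Biden", str(["president", "hello", "poland"])])

# read it back: the field is now the text "['president', 'hello', 'poland']"
buffer.seek(0)
row = next(csv.DictReader(buffer))
tokens = ast.literal_eval(row["tokens"])  # safely parse the list literal
```

With pandas, the same conversion can be requested at load time through the `converters` argument of `pd.read_csv`, for example `converters={'tokens': ast.literal_eval}`.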
In [4]:
speech_data_pre.head()
Out[4]:
name title date about speech no. birthplace political party term start_date end_date tokens
0 Joe Biden February 21, 2023: Remarks on the One-Year Ann... 2023-02-21 Speaking at the Royal Castle in Warsaw, Poland... THE PRESIDENT: Hello, Poland! One of our great... 46 Pa. Democratic 2021– 2021 2021 [president, hello, poland, one, great, ally, p...
1 Joe Biden February 7, 2023: State of the Union Address 2023-02-07 In his State of the Union Address, President J... Mr. Speaker. Madam Vice President. Our Firs... 46 Pa. Democratic 2021– 2021 2021 [speaker, madam, vice, president, first, lady,...
2 Joe Biden September 21, 2022: Speech before the 77th Ses... 2022-09-21 President Joe Biden addresses the 77th session... Thank you. Mr. President, Mr. Secretary-Gene... 46 Pa. Democratic 2021– 2021 2021 [thank, president, fellow, leader, last, year,...
3 Joe Biden September 1, 2022: Remarks on the Continued Ba... 2022-09-01 President Joe Biden speaks in Philadelphia, Pe... THE PRESIDENT: My fellow Americans, please, if... 46 Pa. Democratic 2021– 2021 2021 [president, fellow, american, please, seat, ta...
4 Joe Biden May 24, 2022: Remarks on School Shooting in Uv... 2022-05-24 President Biden makes an impassioned plea to s... Good evening, fellow Americans. I had hoped, w... 46 Pa. Democratic 2021– 2021 2021 [good, evening, fellow, american, hoped, becam...
In [5]:
# number of cleaned tokens in each speech (repeated words are counted each time)
wordCount_eachToken=list(map(len,speech_data_pre.tokens))
In [6]:
speech_data_pre.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1091 entries, 0 to 1090
Data columns (total 12 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   name             1091 non-null   object
 1   title            1091 non-null   object
 2   date             1091 non-null   object
 3   about            1090 non-null   object
 4   speech           1091 non-null   object
 5   no.              1091 non-null   int64 
 6   birthplace       1091 non-null   object
 7   political party  1091 non-null   object
 8   term             1091 non-null   object
 9   start_date       1091 non-null   int64 
 10  end_date         1091 non-null   int64 
 11  tokens           1091 non-null   object
dtypes: int64(3), object(9)
memory usage: 102.4+ KB
In [7]:
speech_data_pre['wordCount']=wordCount_eachToken
In [12]:
speech_data_pre['wordCount']
Out[12]:
0       1347
1       3765
2       1879
3       1301
4        376
        ... 
1086     631
1087     631
1088     393
1089     201
1090     645
Name: wordCount, Length: 1091, dtype: int64

Longest and shortest speeches by president¶

In [8]:
speech_data_pre.count()
Out[8]:
name               1091
title              1091
date               1091
about              1090
speech             1091
no.                1091
birthplace         1091
political party    1091
term               1091
start_date         1091
end_date           1091
tokens             1091
wordCount          1091
dtype: int64
In [14]:
speech_data_pre.groupby('name').wordCount.agg('sum')
Out[14]:
144416
In [11]:
# idxmax()/idxmin() return index labels, so look the rows up with .loc
max_value=speech_data_pre.loc[speech_data_pre['wordCount'].idxmax(), ['name','wordCount']]

print("Max Speech president is ",max_value)

min_value=speech_data_pre.loc[speech_data_pre['wordCount'].idxmin(), ['name','wordCount']]

print("Min Speech President is ",min_value)
Max Speech president is  name         Abraham Lincoln
wordCount              14614
Name: 875, dtype: object
Min Speech President is  name         George Washington
wordCount                   58
Name: 1081, dtype: object
In [90]:
speech_data_pre.groupby('name').wordCount.agg('sum')
Out[90]:
'DemocraticRepublicanDemocratic (Union)WhigNational RepublicanDemocratic-RepublicanFederalist'
In [16]:
speech_data_pre.groupby(['political party']).wordCount.agg('mean')
Out[16]:
political party
Democratic               1864.635815
Democratic (Union)       1474.064516
Democratic-Republican    1026.535714
Federalist                718.433333
National Republican      2075.750000
Republican               2069.102506
Whig                     1984.900000
Name: wordCount, dtype: float64

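From the output above, National Republican (2075.75) and Republican (2069.10) presidents have the highest average word count per speech, followed by Whig (1984.90) and Democratic (1864.64); Federalist presidents have the lowest (718.43)¶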
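
Note that `wordCount` measures the length of each tokenized speech, while vocabulary is usually taken to mean the number of distinct words. Below is a minimal sketch of that alternative measure, assuming each row carries a list of tokens as in the `tokens` column. The helper name `vocabulary_size` and the toy data are hypothetical.

```python
def vocabulary_size(token_lists):
    """Return the number of distinct tokens across several token lists."""
    vocabulary = set()
    for tokens in token_lists:
        vocabulary.update(tokens)
    return len(vocabulary)


# Toy stand-in for the `tokens` column, grouped by president (hypothetical data)
speeches_by_president = {
    "President A": [["state", "union", "state"], ["union", "law"]],
    "President B": [["people", "people", "people"]],
}

vocab = {name: vocabulary_size(lists) for name, lists in speeches_by_president.items()}
largest = max(vocab, key=vocab.get)
smallest = min(vocab, key=vocab.get)
print(vocab, largest, smallest)
```

In the notebook, the same idea could probably be applied with `speech_data_pre.groupby('name')['tokens'].apply(vocabulary_size)`, since each group is an iterable of token lists.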

  2. Create a barplot of presidential vocabulary from the earliest president (Washington) to the latest (Biden) in chronological order. Color code this barplot as blue for Democrat, red for Republican, and gray for Others. (1 point)
In [24]:
col_party={'Democratic':"blue",'Republican':"red"}

# Any party other than Democratic or Republican falls through to gray
party_colors = speech_data_pre['political party'].map(col_party).fillna("gray")
# Create the horizontal barplot: presidents on the y-axis, word counts on the x-axis
plt.figure(figsize=(12, 6))
plt.barh(speech_data_pre['name'], speech_data_pre['wordCount'], color=party_colors)
plt.xlabel("Vocabulary")
plt.ylabel("Presidents")
plt.title("Presidential Vocabulary from Washington to Biden")
plt.xticks(rotation=90)
plt.tight_layout()

# Show the plot or save it to a file
plt.show()

# In this code, each speech's party is mapped to a color (gray for any party other than Democratic or Republican), and Matplotlib draws one horizontal bar per speech, labelled by president. Finally, the plot is displayed using plt.show().

Blue is Democratic, red is Republican, and gray is Other. From the graph, red bars are the most numerous, and the longest bars (highest word counts) belong to Republican presidents¶
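
The question asks for chronological order, whereas the plot above draws one bar per speech in row order. Below is a sketch of preparing one bar per president first, assuming the `no.` column holds the presidency number. The helper `bar_data` is hypothetical, and the sample rows reuse a few word counts from the outputs above.

```python
PARTY_COLORS = {"Democratic": "blue", "Republican": "red"}


def bar_data(rows):
    """Aggregate (number, name, party, word_count) rows into per-president bars.

    Returns three parallel lists (names, totals, colors) ordered by
    presidency number, i.e. chronologically.
    """
    totals = {}
    for number, name, party, word_count in rows:
        key = (number, name, party)
        totals[key] = totals.get(key, 0) + word_count
    names, values, colors = [], [], []
    for (number, name, party), total in sorted(totals.items()):
        names.append(name)
        values.append(total)
        # Any party other than Democratic or Republican is drawn in gray
        colors.append(PARTY_COLORS.get(party, "gray"))
    return names, values, colors


rows = [
    (46, "Joe Biden", "Democratic", 1347),
    (46, "Joe Biden", "Democratic", 3765),
    (1, "George Washington", "Federalist", 58),
    (16, "Abraham Lincoln", "Republican", 14614),
]
names, values, colors = bar_data(rows)
print(names, values, colors)
```

The three lists could then be passed to `plt.bar(names, values, color=colors)`. Summing word counts is only one possible aggregate; a distinct-word count per president would fit the notion of vocabulary more closely.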

  3. What are the five most frequently used words (exclusive of stop words) used by each president? What are the five most frequently words used collectively by all Democratic presidents versus Republican presidents? (2 point)
In [51]:
top_words_by_president = {}
for president, tokens in speech_data_pre.groupby("name")['tokens']:
    # Flatten all of the president's token lists into a single list of tokens
    all_speeches = [token for token_list in tokens for token in token_list]

    word_freq = Counter(all_speeches)
    top_words = word_freq.most_common(5)
    
    # Store the results in the dictionary
    top_words_by_president[president] = top_words

# Print the top five words for each president
for president, top_words in top_words_by_president.items():
    print(f"Top 5 words for {president}:")
    for word, freq in top_words:
        print(f"{word}: {freq}")
    print()
Top 5 words for Abraham Lincoln:
state: 609
slavery: 411
would: 333
slave: 318
one: 302

Top 5 words for Andrew Jackson:
state: 1332
government: 876
power: 596
may: 558
united: 556

Top 5 words for Andrew Johnson:
state: 1341
united: 473
government: 411
law: 391
constitution: 382

Top 5 words for Barack Obama:
applause: 1324
people: 928
american: 813
u: 691
year: 689

Top 5 words for Benjamin Harrison:
state: 843
government: 667
upon: 643
year: 498
united: 479

Top 5 words for Bill Clinton:
people: 1009
american: 775
year: 661
must: 558
america: 526

Top 5 words for Calvin Coolidge:
government: 384
country: 255
made: 216
people: 210
would: 196

Top 5 words for Chester A. Arthur:
state: 264
government: 231
year: 180
congress: 177
united: 171

Top 5 words for Donald Trump:
people: 1210
president: 1195
going: 931
know: 843
want: 834

Top 5 words for Dwight D. Eisenhower:
nation: 375
world: 311
must: 307
people: 267
year: 235

Top 5 words for Franklin D. Roosevelt:
people: 573
war: 511
nation: 503
government: 471
american: 460

Top 5 words for Franklin Pierce:
state: 660
government: 323
united: 320
power: 234
congress: 197

Top 5 words for George W. Bush:
america: 1052
people: 1006
american: 882
nation: 794
world: 696

Top 5 words for George Washington:
state: 212
united: 157
may: 135
government: 119
nation: 100

Top 5 words for Gerald Ford:
american: 202
state: 184
people: 170
congress: 159
nation: 158

Top 5 words for Grover Cleveland:
government: 1532
state: 1270
year: 1232
upon: 1096
united: 892

Top 5 words for Harry S. Truman:
world: 215
people: 214
nation: 151
united: 138
would: 125

Top 5 words for Herbert Hoover:
government: 469
upon: 342
state: 338
people: 308
year: 283

Top 5 words for James A. Garfield:
government: 21
people: 20
constitution: 17
law: 15
upon: 13

Top 5 words for James Buchanan:
state: 702
government: 445
would: 349
congress: 337
constitution: 296

Top 5 words for James K. Polk:
state: 792
government: 573
mexico: 491
united: 462
war: 398

Top 5 words for James Madison:
state: 244
united: 184
war: 147
government: 117
public: 117

Top 5 words for James Monroe:
state: 355
government: 248
united: 213
great: 210
power: 165

Top 5 words for Jimmy Carter:
president: 576
people: 502
would: 468
year: 420
country: 348

Top 5 words for Joe Biden:
american: 459
people: 407
president: 369
year: 293
america: 290

Top 5 words for John Adams:
state: 117
united: 90
nation: 63
government: 59
country: 55

Top 5 words for John F. Kennedy:
world: 558
state: 530
nation: 520
would: 499
country: 483

Top 5 words for John Quincy Adams:
state: 220
upon: 193
united: 147
year: 143
congress: 140

Top 5 words for John Tyler:
state: 574
government: 384
united: 257
would: 237
may: 222

Top 5 words for Lyndon B. Johnson:
president: 1378
people: 1001
would: 929
year: 872
think: 857

Top 5 words for Martin Van Buren:
government: 402
state: 400
public: 273
upon: 223
bank: 199

Top 5 words for Millard Fillmore:
state: 347
united: 192
government: 166
law: 166
congress: 140

Top 5 words for Richard M. Nixon:
american: 314
year: 299
peace: 287
people: 270
would: 229

Top 5 words for Ronald Reagan:
people: 915
u: 805
year: 773
government: 718
american: 686

Top 5 words for Rutherford B. Hayes:
state: 507
government: 343
united: 327
congress: 265
law: 259

Top 5 words for Theodore Roosevelt:
state: 845
government: 703
law: 613
united: 504
would: 480

Top 5 words for Thomas Jefferson:
state: 167
may: 162
u: 143
shall: 128
nation: 102

Top 5 words for Ulysses S. Grant:
state: 955
united: 620
government: 475
congress: 361
year: 315

Top 5 words for Warren G. Harding:
world: 176
american: 144
government: 126
must: 100
republic: 100

Top 5 words for William Harrison:
power: 63
government: 44
state: 40
constitution: 37
people: 37

Top 5 words for William McKinley:
government: 614
state: 548
united: 402
congress: 277
upon: 258

Top 5 words for William Taft:
government: 604
state: 546
country: 349
law: 346
united: 343

Top 5 words for Woodrow Wilson:
upon: 386
government: 343
nation: 290
people: 279
must: 270

Top 5 words for Zachary Taylor:
state: 100
congress: 60
government: 58
united: 43
treaty: 42

The output above lists the five most frequent words for each president¶

In [54]:
top_words_by_party = {}
specific_values = ['Democratic', 'Republican']
filtered_df = speech_data_pre[speech_data_pre['political party'].isin(specific_values)]
for party, tokens in filtered_df.groupby("political party")['tokens']:
    # Flatten all token lists for the party into a single list of tokens
    all_speeches = [token for token_list in tokens for token in token_list]

    word_freq = Counter(all_speeches)
    top_words = word_freq.most_common(5)

    # Store the results in the dictionary
    top_words_by_party[party] = top_words

# Print the top five words for each party
for party, top_words in top_words_by_party.items():
    print(f"Top 5 words for {party}:")
    for word, freq in top_words:
        print(f"{word}: {freq}")
    print()
Top 5 words for Democratic:
state: 8077
people: 6985
government: 6542
year: 5895
would: 5100

Top 5 words for Republican:
state: 7787
government: 6732
people: 6191
year: 5337
american: 4737

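From the results above, the top Democratic word counts are generally higher than the Republican ones (for example, "state": 8077 vs. 7787), although "government" is more frequent among Republicans (6732 vs. 6542)¶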

The words common to both parties' top five are state, people, government, and year; "would" appears only in the Democratic list and "american" only in the Republican list¶
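
The overlap between the two lists can also be checked programmatically. Below is a small sketch using the counts printed in the output above; the variable names are illustrative.

```python
from collections import Counter

# Top-five counts copied from the party output above
democratic = Counter({"state": 8077, "people": 6985, "government": 6542,
                      "year": 5895, "would": 5100})
republican = Counter({"state": 7787, "government": 6732, "people": 6191,
                      "year": 5337, "american": 4737})

# Set operations on the keys separate shared words from party-specific ones
shared = set(democratic) & set(republican)
only_democratic = set(democratic) - set(republican)
only_republican = set(republican) - set(democratic)

print(sorted(shared))
print(only_democratic, only_republican)
```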

  4. What are the key themes (e.g., freedom, liberty, country, etc.) used by each president in their inaugural speech? (3 points)
In [57]:
for president, tokens in speech_data_pre.groupby("name")['tokens']:
    # Flatten all of the president's token lists into a single list of tokens
    all_speeches = [token for token_list in tokens for token in token_list]

    # Create bigrams using NLTK; n-grams preserve word order, which helps with meaning
    bigrams = list(ngrams(all_speeches, 2))    # list of 2-tuples

    # Build a dictionary mapping each bigram string to its count
    bigram_dict = {}
    for bigram in bigrams:              # iterate through the list of bigrams
        bigram_str = ' '.join(bigram)   # convert the bigram tuple to string
        bigram_dict[bigram_str] = bigram_dict.get(bigram_str, 0) + 1 # start at 1 for a new bigram, otherwise increment its count

    # create a word cloud of bigrams
    from wordcloud import WordCloud
    wordcloud = WordCloud(
        width=1000,
        height=1000,
        background_color='white',
        collocations=False,   # boolean, not the string 'FALSE' (which is truthy)
        min_font_size=16)

    wordcloud.generate_from_frequencies(bigram_dict)
    plt.figure(figsize = (7,7))
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.title(president)
    plt.show()

The word clouds above show the most frequent bigrams across all speeches of each president¶
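
The bigram-counting step inside the loop can be written more compactly. Below is a sketch using only the standard library, where `zip` pairs each token with its successor (the same pairs that `ngrams(tokens, 2)` yields). The helper name `bigram_counts` and the sample tokens are made up for illustration.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent token pairs, keyed by 'first second' strings."""
    pairs = zip(tokens, tokens[1:])
    return Counter(" ".join(pair) for pair in pairs)


tokens = ["united", "state", "government", "united", "state"]
counts = bigram_counts(tokens)
print(counts.most_common(1))
```

Because a `Counter` is a dictionary of frequencies, the result could be handed to `wordcloud.generate_from_frequencies` in place of `bigram_dict`.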

  5. Compute a sentiment (positive/negative) for each presidential speech, and draw a barplot of the sentiment of all presidential speeches in chronological order. Again, color code the speeches as blue for Democrat, red for Republican, and gray for Other. Which of these groups have higher mean sentiment score? Who are the top three presidents with the highest positive sentiment in each group? (2 points)
In [106]:
nltk.download("vader_lexicon")
sia = SentimentIntensityAnalyzer()

dfs = []

# Loop through groups
for president, group in speech_data_pre.groupby("name"):
    # Combine all speeches for the president into a single text
    all_speeches = " ".join(token for token_list in group['tokens'] for token in token_list)

    # Perform sentiment analysis
    Sentiment = sia.polarity_scores(all_speeches)

    # Get the political party for the president
    party = group['political party'].iloc[0]  # Assuming each president has one political party

    # Create a DataFrame with the results
    result_df = pd.DataFrame({'president name': [president], 'Sentiment score': [Sentiment["compound"]], 'political party': [party]})

    # Append the DataFrame to the list
    dfs.append(result_df)

# Concatenate all DataFrames in the list into one DataFrame
concat_speech = pd.concat(dfs, ignore_index=True)

# Display the resulting DataFrame
print(concat_speech)
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\rmura\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
           president name  Sentiment score        political party
0         Abraham Lincoln           1.0000             Republican
1          Andrew Jackson           1.0000             Democratic
2          Andrew Johnson           1.0000     Democratic (Union)
3            Barack Obama           1.0000             Democratic
4       Benjamin Harrison           1.0000             Republican
5            Bill Clinton           1.0000             Democratic
6         Calvin Coolidge           1.0000             Republican
7       Chester A. Arthur           1.0000             Republican
8            Donald Trump           1.0000             Republican
9    Dwight D. Eisenhower           1.0000             Republican
10  Franklin D. Roosevelt           1.0000             Democratic
11        Franklin Pierce           1.0000             Democratic
12         George W. Bush           1.0000             Republican
13      George Washington           1.0000             Federalist
14            Gerald Ford           1.0000             Republican
15       Grover Cleveland           1.0000             Democratic
16        Harry S. Truman           1.0000             Democratic
17         Herbert Hoover           1.0000             Republican
18      James A. Garfield           0.9998             Republican
19         James Buchanan           1.0000             Democratic
20          James K. Polk           1.0000             Democratic
21          James Madison           1.0000  Democratic-Republican
22           James Monroe           1.0000  Democratic-Republican
23           Jimmy Carter           1.0000             Democratic
24              Joe Biden           1.0000             Democratic
25             John Adams           1.0000             Federalist
26        John F. Kennedy           1.0000             Democratic
27      John Quincy Adams           1.0000    National Republican
28             John Tyler           1.0000                   Whig
29      Lyndon B. Johnson           1.0000             Democratic
30       Martin Van Buren           1.0000             Democratic
31       Millard Fillmore           1.0000                   Whig
32       Richard M. Nixon           1.0000             Republican
33          Ronald Reagan           1.0000             Republican
34    Rutherford B. Hayes           1.0000             Republican
35     Theodore Roosevelt           1.0000             Republican
36       Thomas Jefferson           1.0000  Democratic-Republican
37       Ulysses S. Grant           1.0000             Republican
38      Warren G. Harding           1.0000             Republican
39       William Harrison           1.0000                   Whig
40       William McKinley           1.0000             Republican
41           William Taft           1.0000             Republican
42         Woodrow Wilson           1.0000             Democratic
43         Zachary Taylor           1.0000                   Whig
In [113]:
party_colors = {"Democratic": "blue", "Republican": "red"}
# Any party other than Democratic or Republican falls through to gray
party_color = concat_speech['political party'].map(party_colors).fillna("gray")

# Create the barplot
plt.figure(figsize=(12, 6))
plt.bar(concat_speech.index, concat_speech["Sentiment score"], color=party_color)
plt.xlabel("Presidents (alphabetical order, as produced by groupby)")
plt.ylabel("Sentiment Score")
plt.title("Sentiment of Presidential Speeches")
plt.xticks(concat_speech.index, concat_speech["president name"], rotation=45)
plt.tight_layout()

# Show the plot
plt.show()
In [118]:
concat_speech.groupby('political party')["Sentiment score"].mean()
Out[118]:
political party
Democratic               1.000000
Democratic (Union)       1.000000
Democratic-Republican    1.000000
Federalist               1.000000
National Republican      1.000000
Republican               0.999989
Whig                     1.000000
Name: Sentiment score, dtype: float64

Based on the results above, every party has the maximum mean sentiment score of 1.0 except Republican (0.999989), which is pulled down slightly by James A. Garfield's score of 0.9998¶
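
Scores this close to 1.0 are expected when all of a president's speeches are scored as one long text: VADER's compound score is a normalized sum of word valences, so it saturates on long inputs. The sketch below reproduces that normalization (x / sqrt(x² + 15), as used in VADER) and shows a hypothetical helper, `mean_compound`, that averages per-speech scores instead. The stand-in scorer and its values are made up for illustration.

```python
import math


def normalize(score, alpha=15):
    """Map an unbounded valence sum into (-1, 1), as VADER does for `compound`."""
    return score / math.sqrt(score * score + alpha)


def mean_compound(speeches, score_fn):
    """Average a per-speech sentiment score instead of scoring one joined text."""
    scores = [score_fn(text) for text in speeches]
    return sum(scores) / len(scores)


# A short text with a small valence sum stays well below 1 ...
short_score = round(normalize(4), 4)
# ... while a very long text with a large valence sum rounds to 1.0
long_score = round(normalize(500), 4)
print(short_score, long_score)

# Stand-in scorer for illustration only (not real VADER output)
fake_scores = {"good speech": 0.8, "grim speech": -0.4}
average = mean_compound(list(fake_scores), fake_scores.get)
print(average)
```

With NLTK, `score_fn` could be something like `lambda text: sia.polarity_scores(text)["compound"]`, applied to each speech separately so that differences between presidents and parties remain visible.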

In [123]:
grouped_by_party = concat_speech.groupby('political party')

# Create a dictionary to store the top three presidents for each group
top_presidents_by_party = {}

# Find the top three presidents with the highest positive sentiment in each group
for party, group in grouped_by_party:
    # Sort the group by 'Sentiment score' in descending order and take the top three
    top_presidents = group.nlargest(3, 'Sentiment score')
    
    # Store the top presidents in the dictionary
    top_presidents_by_party[party] = top_presidents

# Print the top three presidents for each group
for party, top_presidents in top_presidents_by_party.items():
    print(f"Top Three Presidents in Party {party}:")
    print(top_presidents)
    print()
Top Three Presidents in Party Democratic:
   president name  Sentiment score political party
1  Andrew Jackson              1.0      Democratic
3    Barack Obama              1.0      Democratic
5    Bill Clinton              1.0      Democratic

Top Three Presidents in Party Democratic (Union):
   president name  Sentiment score     political party
2  Andrew Johnson              1.0  Democratic (Union)

Top Three Presidents in Party Democratic-Republican:
      president name  Sentiment score        political party
21     James Madison              1.0  Democratic-Republican
22      James Monroe              1.0  Democratic-Republican
36  Thomas Jefferson              1.0  Democratic-Republican

Top Three Presidents in Party Federalist:
       president name  Sentiment score political party
13  George Washington              1.0      Federalist
25         John Adams              1.0      Federalist

Top Three Presidents in Party National Republican:
       president name  Sentiment score      political party
27  John Quincy Adams              1.0  National Republican

Top Three Presidents in Party Republican:
      president name  Sentiment score political party
0    Abraham Lincoln              1.0      Republican
4  Benjamin Harrison              1.0      Republican
6    Calvin Coolidge              1.0      Republican

Top Three Presidents in Party Whig:
      president name  Sentiment score political party
28        John Tyler              1.0            Whig
31  Millard Fillmore              1.0            Whig
39  William Harrison              1.0            Whig

In [124]:
concat_speech.to_csv("./data/Sentimental_score_final.csv")

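Based on the results above, each party has at least three presidents except Federalist (two), National Republican (one), and Democratic (Union) (one). Because almost every score is 1.0, the top three within each party are tied, and nlargest simply returns them in row order¶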

Analysis¶

  • Used Selenium to extract president and presidential speech information from the web
  • After extraction, applied preprocessing as required, such as splitting the term into start and end dates
  • Removed NaN values from the data
  • After preprocessing, merged the datasets on president name
  • Saved the merged data to a CSV file
  • Read the saved data and applied tokenization techniques such as stop-word removal and lemmatization
  • Computed the maximum and minimum word counts across speeches
    • Longest speech: Abraham Lincoln, word count 14614
    • Shortest speech: George Washington, word count 58
  • Computed the average word count for each party
  • Plotted the word counts of all speeches by president
  • Using iteration, found the most common words for each president and for each party group
  • Generated a bigram word cloud for each president
  • Computed a sentiment score for each president from their tokens using SentimentIntensityAnalyzer(), keeping the party alongside
  • Plotted the sentiment scores colored by party
  • Finally, collected each president's sentiment score and party name in a DataFrame and saved it to CSV